Continuous visual speech recognition using geometric lip-shape models and neural networks
نویسندگان
چکیده
This paper describes a new approach for automatic speechreading. First, we use efficient, but effective representation of visible speech: a geometric lipshape model. Then we present an automatic objective method to merge phonemes that appear visually similar into visemes for our speaker. In order to determine visemes, we trained SOM using the Kohonen algorithm on each phoneme extracted from our visual database. We go into the presentation of our visual speech recognition systems based on heuristics and neural networks (TDNN or JNN) trained to discriminate visual information. On a continuous spelling task, visual-alone recognition performance of about 37 % was achieved using the TDNN and about 33 % using the JNN one.
منابع مشابه
Visual speech recognition using active shape models and hidden Markov models
This paper describes a novel approach for visual speech recognition. The shape of the mouth is modelled by an Active Shape Model which is derived from the statistics of a training set and used to locate, track and parameterise the speaker’s lip movements. The extracted parameters representing the lip shape are modelled as continuous probability distributions and their temporal dependencies are ...
متن کاملImproving lip-reading performance for robust audiovisual speech recognition using DNNs
This paper presents preliminary experiments using the Kaldi toolkit [1] to investigate audiovisual speech recognition (AVSR) in noisy environments using deep neural networks (DNNs). In particular we use a single-speaker large vocabulary, continuous audiovisual speech corpus to compare the performance of visual-only, audio-only and audiovisual speech recognition. The models trained using the Kal...
متن کاملPersian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
متن کاملDesigning and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among this, lip-reading techniques encountered with many challenges for speech recognition, that one of the challenges b...
متن کامل3d Lip-tracking for Audio-visual Speech Recognition in Real Applications
In this paper, we present a solution to the problem of tracking 3D information about the shape of lips from 2D picture of a speaker. We focus on lip-tracking of audio-visual speech recordings from the Czech in-vehicle audio-visual speech corpus (CIVAVC). The corpus consists of 4 h 40 min records of audiovisual speech of driver recorded in a car during driving in an usual traffic. In real condit...
متن کامل